Journal of NeuroEngineering and Rehabilitation — Latest Matching Preprints

1

Generic versus personalized foot-ground contact models for predictive simulations of walking: Is personalization worth the effort?

Williams, S. T.; Li, G.; Fregly, B. J.

2026-04-21 bioengineering 10.64898/2026.04.16.719049 medRxiv

Top 0.1%

18.4%

Purpose: Quantification of walking function, including joint motions, ground reactions, and joint loads, outside the lab is a growing research area. Because only joint motions can currently be measured outside the lab, researchers are utilizing tracking optimizations of walking to estimate associated ground reactions and inverse dynamic joint loads. However, foot-ground contact models used in such optimizations have been generic rather than personalized, which may limit the accuracy of estimated ground reactions and joint loads. This study compares the predictive capabilities of generic versus personalized foot-ground contact models. Methods: Generic and personalized foot-ground contact models were evaluated in calibration and tracking optimizations performed using experimental walking data collected from three subjects in varying states of health. Foot-only calibration optimizations evaluated how well both models could reproduce experimental ground reaction and foot motion data while tracking both types of data simultaneously, while whole-body tracking optimizations evaluated how well both models could reproduce experimental ground reactions, joint motion, and joint load data while tracking only experimental joint motion data and achieving dynamic consistency. Results: For all three subjects and both types of optimizations, personalized foot-ground contact models reproduced experimental ground reaction, joint motion, and joint load data more accurately than generic foot-ground contact models. Conclusion: Personalized foot-ground contact models can improve the accuracy with which ground reactions and joint loads can be estimated via tracking optimizations of walking using only experimental motion data as inputs. Personalized models require little time and effort to calibrate using freely available software tools and should improve the accuracy of predictive simulations of walking as well.

2

Vision Language Model for Coronary Angiogram Analysis and Report Generation: Development and Evaluation Study

Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.

2026-04-21 cardiovascular medicine 10.64898/2026.04.19.26351241 medRxiv

Top 0.6%

1.5%

Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard for diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate angiogram interpretation, with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes from 1987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptor weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, negative predictive value of 0.58 and specificity of 0.52. This study demonstrates the potential of using a VLM to streamline angiogram interpretation, rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions.

Author Summary: Coronary artery disease carries a heavy disease burden worldwide, and coronary angiography is the gold-standard imaging modality for its diagnosis. Interpreting these complex images and producing clinical reports requires significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, and full report generation. We also benchmarked the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, which was evaluated on the same test sets. We examined how machine learning metrics such as the intersection over union score may not fully capture the clinical accuracy of model predictions and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could potentially assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculations of clinical scores such as the SYNTAX score.
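As a quick consistency check on the stenosis-detection figures quoted in the abstract above, the F1-score can be recomputed as the harmonic mean of precision and recall. This is only a sketch: the function name `f1_score` is ours, and it assumes the reported F1 is the plain harmonic mean of the reported precision and recall (the weighted anatomy scores may be aggregated per class, so they are not recomputed here).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract for stenosis detection.
stenosis_f1 = f1_score(0.56, 0.64)
print(round(stenosis_f1, 2))  # agrees with the reported 0.60 after rounding
```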

3

When Noise Isn't Simply Noise: Deterministic Postural Drive During Noisy Galvanic Vestibular Stimulation (nGVS)

Rice, D.; Dakin, C. J.; Ewer, M.; Hannan, K. B.

2026-04-22 neuroscience 10.64898/2026.04.20.719310 medRxiv

Top 0.7%

1.0%

Age- and disease-related vestibular decline can cause dizziness and postural instability, motivating interventions such as noisy galvanic vestibular stimulation (nGVS). nGVS is commonly delivered at "subsensory" amplitudes and explained by stochastic resonance, yet because galvanic stimulation directly modulates vestibular afferents, even imperceptible currents may also exert deterministic effects on balance. This study examined whether low-amplitude nGVS (<1 mA), as typically used in stochastic resonance paradigms, directly influences postural behavior through stimulus-response coupling. Twenty healthy young adults stood on a force plate with feet together and eyes closed on either a rigid surface or 10-cm foam. In randomized order, they completed 300-second trials with band-limited (0-30 Hz), zero-mean nGVS at ±0, 0.1, 0.2, 0.3, 0.5, and 0.7 mA. Coupling between the stimulation waveform and mediolateral ground-reaction force was assessed using coherence and time-cumulant density. Mean coherence was significant mainly at higher amplitudes (0.5-0.7 mA) on both surfaces, whereas time-cumulant density identified significant time-locked vestibular-evoked response components at much lower amplitudes, down to 0.1 mA. These included an early response around 135-155 ms and a later, prominent response around 360-410 ms. Individually, significant coherence was common at 0.5-0.7 mA (15-19 of 20 participants), while cumulant-based responses appeared in some participants even at 0.1 mA. Responses were clearer on foam, consistent with greater vestibular reliance when somatosensory input is less reliable. Overall, low-amplitude nGVS can entrain postural output, suggesting that balance changes during "subsensory" stimulation may reflect both stochastic-resonance-like effects and deterministic vestibular drive, underscoring the need to quantify coupling alongside performance outcomes.

4

Wearable Dual-Modality Plethysmography for Arterial Modulation and Blood Pressure Dip

Jung, S.; Thomson, S.

2026-04-21 physiology 10.64898/2026.04.17.719282 medRxiv

Top 0.8%

0.8%

Continuous, non-invasive cardiovascular monitoring is limited by the superficial sensing depth of photoplethysmography (PPG), which is susceptible to peripheral artifacts. This study evaluates a wearable dual-modality prototype integrating dry-electrode impedance plethysmography (IPG) and PPG within a smartwatch form factor. Results from a pilot study (N=2) demonstrate that IPG signals exhibit a temporal lead over PPG across ventral and dorsal sites, supporting its greater penetration depth. During brachial artery modulation, IPG showed superior sensitivity to arterial recovery on the ventral forearm. Furthermore, 60-minute napping sessions revealed that while PPG remained morphologically stable, IPG signals underwent significant evolution, capturing distinct pulse-wave archetypes. These findings suggest that wearable IPG provides a high-fidelity window into deep systemic hemodynamics typically reserved for clinical instrumentation.

5

Decision Curve Analysis for Evaluating Machine Learning Models for Next-Day Transfer Out of ICU

Pozo, M.; Pape, A.; Locke, B.; Pettine, W. W.

2026-04-21 health informatics 10.64898/2026.04.19.26351213 medRxiv

Top 0.9%

0.8%

Timely identification of intensive care unit (ICU) patients likely to exit the unit can support anticipatory workflows such as chart review, eligibility screening, and patient outreach prior to transfer. Most ICU discharge prediction studies report discrimination and calibration, but these metrics do not quantify the decision consequences of acting on predictions. Using adult ICU admissions from MIMIC-IV, we represented each ICU stay as a sequence of daily clinical summaries and trained logistic regression, random forest, and XGBoost models to predict next-day ICU transfer. Models achieved ROC AUC of 0.80-0.84 with differing calibration. We evaluated decision utility using decision curve analysis (DCA), where positive predictions trigger proactive review. Across thresholds, model-guided strategies outperformed review-all, review-none, and a simple clinical rule. To translate net benefit into implementable operations, we modeled a clinical trial recruitment workflow with an 8-hour daily time constraint, incorporating chart review and consent effort. At a feasible operating threshold (0.23), the model flagged ~23 charts/day and yielded ~1.23 enrollments/day under conservative eligibility and consent assumptions. These results demonstrate that DCA provides a transparent framework for determining when ICU transfer predictions are worth using and how thresholds should be selected to align with real-world workflow constraints. Data and Code Availability: This research has been conducted using data from MIMIC-IV. Researchers can request access via PhysioNet. Implementation code is available upon request.
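For readers unfamiliar with decision curve analysis, the quantity it plots is net benefit: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below uses the standard textbook definition, not the authors' code (which is available only on request). The function names and all counts are hypothetical; only the 0.23 threshold comes from the abstract.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Standard DCA net benefit: TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_review_all(prevalence: float, threshold: float) -> float:
    """Net benefit of flagging every patient (the 'review-all' strategy)."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical cohort: 100 patient-days, 20 true transfers,
# the model flags 23 charts of which 15 are true transfers.
nb_model = net_benefit(tp=15, fp=8, n=100, threshold=0.23)
nb_all = net_benefit_review_all(prevalence=0.20, threshold=0.23)
nb_none = 0.0  # 'review-none' has zero net benefit by definition

print(f"model: {nb_model:.4f}, review-all: {nb_all:.4f}, review-none: {nb_none:.4f}")
```

A strategy is worth using at a given threshold only if its net benefit exceeds both the review-all and review-none baselines, which is the comparison the abstract describes.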

6

Assessing ageing, cognitive ability and freezing of gait in Parkinson's disease through integrated brain-heart network dynamics

Pitti, L.; Sitti, G.; Candia-Rivera, D.

2026-04-23 neurology 10.64898/2026.04.22.26351482 medRxiv

Top 1.0%

0.7%

Parkinson's Disease (PD) is a complex neurodegenerative disorder that manifests through systemic, large-scale physiological reorganizations. While research often focuses on region-specific neural changes, there is a growing need for multidomain approaches to capture the complexity of the disease and its clinical heterogeneity. This study proposes an analytical pipeline to evaluate Brain-Heart Interplay (BHI) as a novel systemic biomarker for neurodegeneration and healthy ageing. We assessed BHI across three open-source datasets (EEG and ECG signals). We compared Healthy Young, Healthy Elderly, and PD patients in resting state to investigate the effects of ageing and cognitive performance. Additionally, we studied BHI trends in PD patients at the moment of freezing of gait (FOG). Methodologically, brain network organization was quantified using coherence-based EEG connectivity and graph theory, while heart activity was analyzed through Poincaré plot-derived measures of cardiac autonomic activity. The coupling between these two systems was measured using the Maximal Information Coefficient to capture linear and non-linear dependencies between global cortical organization and cardiac autonomic outflow. The results demonstrate that BHI is a sensitive biomarker for detecting early multisystem dysfunction in both neurodegeneration and ageing. Furthermore, the identification of specific BHI trends during FOG onset suggests new opportunities for understanding the physiological mechanisms driving motor complications in PD. Our proposed pipeline provides a guiding tool for large-scale physiological assessment in clinical research.

7

Trans-Aqueduct Access to the Third Ventricle for Delivery of Medical Devices: A Feasibility Study

Haines, M. H.; Ronayne, S. M.; Pickles, K.; Begg, D. A.; Hurley, P. J.; Ferraccioli, M.; Desmond, P.; Opie, N. L.

2026-04-21 neurology 10.64898/2026.04.14.26348906 medRxiv

Top 1%

0.7%

This research demonstrates that the trans-aqueduct approach is a feasible, minimally invasive access pathway to the third ventricle, offering a potential route to the deep brain for therapeutic technologies. Further pre-clinical investigation is required to thoroughly evaluate physiological tolerance, trauma risk, and the long-term implications of intraventricular implantation. The third ventricle is a high-value site for neuromodulation due to its proximity to deep-brain targets, including the subthalamic nucleus (STN) and globus pallidus internus (GPi). This study defined the anatomical pathway and evaluated the technical feasibility of retrograde access to the third ventricle via the cerebral aqueduct using minimally invasive interventional techniques. Evaluation was conducted in three phases using human MRI datasets (n=16; mean age 48.4 years) and cadaveric specimens (n=6; mean age 88.2 years). Phase 1 involved morphometric MRI analysis of the aqueduct and ventricles. Phase 2 tested trans-aqueduct access on cadaver specimens via fluoroscopically guided guidewires and catheters. Phase 3 utilized direct anatomical dissections on cadaver specimens (n=3) to morphometrically measure the third ventricular cavity and its relationship to deep-brain nuclei. Measurements across the sample groups showed a mean aqueduct diameter of 1.6 mm (SD=0.14). Third ventricle dimensions averaged 27.6 mm (ventral-dorsal), 19.9 mm (caudal-cranial), and 5.7 mm (lateral). Successful access to the third ventricle was achieved in 83% (5/6) of cadaveric specimens. The optimal technical configuration utilized a 0.018'' angled-tip guidewire and 5-6 Fr catheters; the aqueduct accommodated diameters up to 2.0 mm with minimal resistance. The STN and GPi were localized within 5-20 mm of the ventricular volumetric centroid. The trans-aqueduct approach is a technically feasible, minimally invasive pathway for accessing the third ventricle.
This route offers a potential alternative for the delivery of therapeutic neurotechnologies. Further research is required to assess physiological tolerance, trauma risk, and the long-term safety of intraventricular implantation.

8

QRS Detection by Combinatorial Optimization With MLP Assisted Peak Scoring

Hopenfeld, B.

2026-04-22 bioengineering 10.64898/2026.04.19.719501 medRxiv

Top 1%

0.5%

A multichannel QRS detector is described. The detector partitions raw signal segments into peak domains, extracts parameters associated with the peak domains, and scores peaks based on these parameters. A multi-layer perceptron (MLP) with 11 inputs generates provisional peak scores, which are refined through application of rules involving 20-30 parameters. An optimal sequence of suprathreshold peaks is determined. Separately, combinatorial optimization determines an optimal structured heart rhythm sequence. Adjudication between the general suprathreshold sequence and the structured sequence depends on noise level, peak quality, and rhythm structure quality. For multichannel fusion, peak scores are determined as a noise-weighted function of channel peak scores. The MLP was trained on approximately 70% of channel 1 of the MIT-BIH Arrhythmia Database. The supplementary rules were heuristically chosen over all channel 1 records. Sensitivity (SE) and positive predictive value (PPV) of the detector applied to channel 2 were a function of the noise threshold used to discard segments. At a noise level that would exclude 2.2% of channel 1 data, the SE and PPV were 99.67% and 99.75% respectively. Importantly, even in high noise, the detector was able to track large-scale features of heart rhythm. Fused channel 1 and channel 2 SE and PPV were 99.96% and 99.98% respectively. The present algorithm points the way toward maximal extraction of heart rhythm information from noisy signals, and the potential to reduce false alarms generated by automated rhythm analysis software.
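The two figures of merit quoted above follow the usual beat-detection definitions: sensitivity is the fraction of true beats that were detected, and positive predictive value is the fraction of detections that were true beats. A minimal sketch follows; the function names and the beat counts are hypothetical illustrations, not values from the preprint.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN): share of annotated beats the detector found."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP): share of detections that match a real beat."""
    return tp / (tp + fp)


# Hypothetical record: 1000 annotated beats, 990 detected, 5 spurious detections.
se = sensitivity(tp=990, fn=10)
ppv = positive_predictive_value(tp=990, fp=5)
print(f"SE = {se:.2%}, PPV = {ppv:.2%}")
```

Because neither metric counts true negatives, both stay meaningful for beat detection, where "non-beat" samples vastly outnumber beats.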

9

The impact of cognitive processes associated with image recognition on visuo-vestibular interaction

Malara, P.; Tosin, A. G.; Castellucci, A.; Martellucci, S.; Musumano, L. B.; Mandala, M.

2026-04-23 otolaryngology 10.64898/2026.04.22.26351361 medRxiv

Top 1%

0.5%

An increasing number of studies highlight the role of saccadic remodulation in compensatory mechanisms following vestibular injury, and the reappearance of SHIMP saccades correlates with symptom improvement measured by the Dizziness Handicap Inventory (DHI). To investigate the influence of attentional processes and working memory on visuo-vestibular interaction, three independent but interrelated experiments were conducted. In the first two experiments, healthy subjects and patients with unilateral or bilateral vestibular deficits underwent vHIT in SHIMP mode and the Functional Head Impulse Test (fHIT), performed first separately and subsequently simultaneously. Mean latency and clustering of SHIMP saccades, together with Landolt C recognition rates, were analyzed. Differences between separate and combined protocols were assessed, and, in patients, correlated with symptom severity measured by the DHI, to determine whether the near-simultaneous execution of tasks mediated by shared parietal cortical substrates influenced performance. In the third experiment, vHIT in HIMP mode and fHIT were performed using separate and combined protocols to evaluate whether recognition-related cognitive load affected recovery saccade latency and clustering. Results suggest that visual recognition modulates visuo-vestibular interaction, supporting integrated dual-task protocols for ecological balance assessment and helping explain clinical discrepancies.

10

Dissecting clinical reasoning failures in frontier artificial intelligence using 10,000 synthetic cases

Auger, S. D.; Varley, J.; Hargovan, M.; Scott, G.

2026-04-23 neurology 10.64898/2026.04.22.26351488 medRxiv

Top 1%

0.4%

Background: Current medical large language model (LLM) evaluations largely rely on small collections of cases, whereas rigorous safety testing requires large-scale, diverse, and complex cases with verifiable ground truth. Multiple Sclerosis (MS) provides an ideal evaluation model, with validated diagnostic criteria and numerous paraclinical tests informing differential diagnosis, investigation, and management. Methods: We generated synthetic MS cases with ground-truth labels for diagnosis, localisation, and management. Four frontier LLMs (Gemini 3 Pro/Flash, GPT 5.2/5 mini) were instructed to analyse cases to provide anatomical localisation, differential diagnoses, investigations, and management plans. An automated evaluator compared these outputs to the ground-truth labels. Blinded subspecialty experts validated 70 cases for realism and automated evaluator accuracy. We then evaluated LLM decision-making across 1,000 cases and scaled to 10,000 to characterise rare, catastrophic failures. Results: Subspecialist expert review confirmed 100% synthetic case realism and 99.8% (95% CI 95.5 to 100) automated evaluation accuracy. Across 1,000 generated MS cases, all LLMs successfully included MS in the differential diagnoses for more than 91% of cases. However, diagnostic competence was not associated with treatment safety. Gemini 3 models had low rates of clinically appropriate steroid recommendations (Flash: 7.2%, 95% CI 5.6 to 8.8; Pro: 15.8%, 95% CI 13.6 to 18.1) compared to GPT 5 mini (23.5%, 95% CI 20.8 to 26.1), frequently overlooking contraindications like active infection. OpenAI models inappropriately recommended acute intravenous thrombolysis for MS cases (9.6% GPT 5.2; 6.4% GPT 5 mini) compared to below 1% for Gemini models. Expanded evaluation (to 10,000 cases) probed these errors in detail.
Thrombolysis was recommended in 10.1% of cases lacking symptom timing information and paradoxically persisted (2.9%) even when symptoms were explicitly documented as more than 14 days old. Conclusion: Automated expert-level evaluation across 10,000 cases characterised artificial intelligence clinical blind spots hitherto invisible to small-scale testing. Massive-scale simulation and automated interrogation should become standard for uncovering serious failures and implementing safety guardrails before clinical deployment exposes patients to risk.

11

Multimodal Integration of Ambulatory ECG and Clinical Features for Sudden Cardiac Death and Pump Failure Death Prediction

Swee, S.; Adam, I.; Zheng, E. Y.; Ji, E.; Wang, D.; Speier, W.; Hsu, J.; Chang, K.-W.; Shivkumar, K.; Ping, P.

2026-04-22 cardiovascular medicine 10.64898/2026.04.21.26351421 medRxiv

Top 1%

0.3%

Ambulatory electrocardiography (ECG) provides continuous monitoring of the heart's electrical activity. However, many existing machine learning and artificial intelligence models for analyzing ambulatory ECG traces are unimodal and do not incorporate patient clinical context. In this study, we propose a multimodal framework integrating ambulatory ECG-derived representations with clinical text embeddings to predict two cardiac outcomes: sudden cardiac death and pump failure death. Ambulatory ECG traces are preprocessed, segmented, and encoded via a multiple instance learning and temporal convolutional neural network framework. In parallel, patient clinical features are parsed into structured prompts, which are passed through a large language model to generate clinical reasoning; this reasoning passes through a biomedical language encoder to generate a text embedding. With the ECG and text embeddings, we systematically evaluate multiple fusion strategies, including concatenation- and gating-based approaches, to integrate these two data modalities. Our results demonstrate that multimodal models consistently outperform unimodal baselines, with adaptive fusion mechanisms providing the greatest improvements in predictive performance. Decision curve analysis highlights the potential clinical utility of the proposed framework for risk stratification. Finally, we visualize model attention across modalities, including ECG attention patterns, segment-level saliency, heart rate variability features, and clinical reasoning, to contextualize patient-specific predictions.

12

MedSafe-Dx (v0): A Safety-Focused Benchmark for Evaluating LLMs in Clinical Diagnostic Decision Support

Van Oyen, C.; Mirza-Haq, N.

2026-04-21 health informatics 10.64898/2026.04.14.26350711 medRxiv

Top 2%

0.3%

MedSafe-Dx (v0) is a new safety-focused benchmark for evaluating large language models in clinical diagnostic decision support using a filtered subset of the DDx Plus dataset (N=250). MedSafe-Dx evaluates three dimensions: escalation sensitivity, avoidance of false reassurance, and calibration of uncertainty. Models were tasked with providing a ranked differential (ICD-10), an escalation decision (Urgent vs. Routine), and a confidence flag. Performance was measured via a "Safety Pass Rate," a composite metric penalizing three hard failure modes: missed escalations of life-threatening conditions, overconfident incorrect diagnoses, and unsafe reassurance in ambiguous cases. Evaluation of eleven models revealed a significant disconnect between diagnostic recall and safety. GPT-5.2 achieved the highest Safety Pass Rate (97.6%), while several models exhibited high rates of missed escalations or unsafe reassurance. MedSafe-Dx provides a robust stress test for identifying high-risk failure modes in diagnostic decision support and shows that high diagnostic accuracy does not guarantee clinical safety. While the benchmark is currently limited by synthetic data and proxy labels, it provides a reproducible, auditable framework for testing AI behavior before clinical deployment. Our findings suggest that interventions such as safety-focused prompting and reasoning-token budgets could be essential components for the safe deployment of LLMs in clinical workflows.

13

Micro-Doppler Radar Identifies Movement Asymmetries After Anterior Cruciate Ligament Reconstruction

Onks, C. A.; Zeng, C.; Creath, R.; Simone, B. D.; Nyland, J. E.; Murphy, T. E.; Kishel, L. A.; Ardat, B. A.; Venezia, V. A.; Wiggins, A. M.; Shaffer, B. R.; Narayanan, R. M.

2026-04-21 sports medicine 10.64898/2026.04.15.26350397 medRxiv

Top 2%

0.3%

Background: Patients who have undergone Anterior Cruciate Ligament Reconstruction (ACLR) have a 6-24% chance of either re-tearing or having subsequent knee surgery. To date there have been no practical validated risk prediction models that can be easily implemented into clinical workflow for re-injury risk. Micro-Doppler radar (MDR) provides a promising solution. Objective: The purpose of this study was to investigate the predictive ability of MDR to identify persons with a previous ACLR relative to age- and sex-matched healthy controls. Methods: ACLR patients (n=81) and controls (n=100) performed drop box jump, sit to stand (STS), and walking trials as MDR signatures were collected. A 1D Convolutional Neural Network was developed to evaluate each activity individually, followed by the development and validation of a fusion model using all three activities. Results: The STS model individually achieved the highest overall accuracy of 82.3%, with a sensitivity of 71.6% and specificity of 91.0%. The fusion model using all activities achieved a peak overall accuracy to detect ACLR of 86.2%, with 80.3% sensitivity and 91% specificity. Conclusions: Currently, there is no clinically validated, efficient approach to objectively evaluate human motion at the point of care. When coupled with machine learning, MDR accurately differentiates ACLR from control groups by identifying complex biomechanical asymmetries, with classification performance comparable to or exceeding that of motion capture. Future research is needed to determine if MDR can be used in conjunction with risk prediction modeling. Key points: Micro-Doppler radar provides a promising new solution to identify important human motion asymmetries in clinical settings. Here we evaluated a group of patients who have a history of Anterior Cruciate Ligament reconstruction versus a control group. Simple movements performed in the presence of the micro-Doppler radar system were used to identify the 2 groups with accuracy comparable or superior to motion capture systems.
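The reported accuracies can be cross-checked against the reported sensitivities and specificities, because overall accuracy is the class-size-weighted average of the two. The sketch below assumes the evaluation set has the same 81:100 ACLR-to-control ratio as the full cohort, which the abstract does not state explicitly; the function name is ours.

```python
def overall_accuracy(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Accuracy = (sens * positives + spec * negatives) / total."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


N_ACLR, N_CONTROL = 81, 100  # cohort sizes quoted in the abstract

# Sit-to-stand model: 71.6% sensitivity, 91.0% specificity.
acc_sts = overall_accuracy(0.716, 0.910, N_ACLR, N_CONTROL)
# Fusion model: 80.3% sensitivity, 91% specificity.
acc_fusion = overall_accuracy(0.803, 0.910, N_ACLR, N_CONTROL)

print(f"STS: {acc_sts:.1%}, fusion: {acc_fusion:.1%}")
```

Under that assumption both values round to the accuracies quoted in the abstract (82.3% and 86.2%), so the three reported metrics are mutually consistent.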

14

GPU-Accelerated Optimization Investigates Synaptic Reorganization Underlying Pathological Beta Oscillations in a Basal Ganglia Network Model

Nakkeeran, K. R.; Anderson, W. S.

2026-04-21 neuroscience 10.64898/2026.04.16.718939 medRxiv

Top 2%

0.3%

Objective: Pathological beta-band oscillations (13 to 30 Hz) in the subthalamic nucleus (STN) are a hallmark of Parkinson's disease and a primary target for deep brain stimulation therapy, yet the specific pattern of synaptic reorganization that drives their emergence remains incompletely understood. We developed a GPU-accelerated computational framework to systematically investigate combinations of synaptic changes across basal ganglia pathways that produce Parkinsonian beta oscillations while satisfying literature-based electrophysiology constraints. Approach: We implemented a biophysically detailed spiking network model of the STN, external globus pallidus (GPe), and internal globus pallidus (GPi) in JAX (a high-performance numerical computing Python library), achieving a 490-fold speedup over conventional CPU-based simulation. Using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), we optimized 10 network parameters across two stages: first establishing a healthy baseline matching primate electrophysiology data, then searching within biologically motivated bounds for synaptic modifications that reproduce Parkinsonian firing rates and beta power. Fixed in-degree connectivity ensured optimized parameters produced scale-invariant dynamics from 450 to 45000 neurons. All simulations ran on a single cloud GPU instance at 84 cents per hour. Main Results: The optimizer converged on a coordinated pattern of synaptic reorganization dominated by asymmetric changes within the STN-GPe reciprocal loop: STN to GPe excitation increased 2.21-fold while GPe to STN inhibition collapsed to 0.11-fold of its healthy value. STN to GPi and GPe to GPi pathways changed minimally (1.06-fold and 1.45-fold respectively). This configuration transformed asynchronous firing (beta: 0.4 percent of spectral power) into synchronized bursting with prominent beta oscillations (49.4 percent), with firing rate changes matching experimental observations. Network dynamics were invariant across a 100-fold range of network sizes (firing rate deviation less than 2.4 Hz; all metrics p less than 0.001 across 10 random seeds at 45000 neurons). We implemented a simplified deep brain stimulation model for validation purposes, which achieved complete beta suppression (49.4 percent to 0.0 percent) and restored GPi output to healthy levels. Significance: These results suggest that pathological beta oscillations emerge from a specific pattern of synaptic reorganization, namely the reduction of GPe inhibitory feedback to STN. The GPU-accelerated optimization framework, running on commodity cloud infrastructure, demonstrates an accessible platform for parameter exploration in neural circuit models and a foundation for generating synthetic training data for adaptive deep brain stimulation algorithms.

15

Large language models and retrieval augmented generation for complex clinical codelists: evaluating performance and assessing failure modes

Matthewman, J.; Denaxas, S.; Langan, S.; Painter, J. L.; Bate, A.

2026-04-24 health informatics 10.64898/2026.04.23.26351098 medRxiv

Top 2%

0.3%

Objectives: Large language models (LLMs) have shown promise in creating clinical codelists for research purposes, a time-consuming task requiring expert domain knowledge. Here, we evaluate the performance and assess failure modes of a retrieval augmented generation (RAG) approach to creating clinical codelists for the large and complex medical terminology used by the Clinical Practice Research Datalink (CPRD). Materials & Methods: We set up a RAG system using a database of word embeddings of the medical terminology that we created using a general-purpose word embedding model (gemini-embedding). We developed 7 reference codelists presenting different challenges and tagged required and optional codes. We ran 168 evaluations (7 codelists, 2 different database subsets, 4 models, 3 epochs each). Scoring was based on the omission of required codes and the inclusion of irrelevant codes. We used model-grading (i.e., grading by another LLM with the reference codelists provided as context) to evaluate the output codelists (a score of 0% being all incorrect and 100% being all correct). Results: We saw varying accuracy across models and codelists, with Gemini 3 Pro (score 43%) generally performing better than Claude Sonnet 4.6 (36%) and Gemini 3 Flash, and OpenAI GPT 5.2 performing worst (14%). Models performed better with shorter target codelists (e.g., Eosinophilic esophagitis with four codes, and Hidradenitis suppurativa with 14 codes). Conversely, all models consistently failed to produce a complete Wrist fracture codelist (with 214 required codes). We further present evaluation summaries and failure mode evaluations produced by parsing LLM chat logs. Discussion: Besides demonstrating that a single-shot RAG approach is currently not suitable for codelist generation, we demonstrate failure modes including hallucinations, retrieval failures, and generation failures where retrieved codes are not used. Conclusions: Our findings suggest that while RAG systems using current frontier LLMs may create correct clinical codelists in some cases, they still struggle with large and complex terminologies and codelists with a large number of codes. The failure modes we highlight can inform the creation of future workflows to avoid failures.

16

Quantitative Assessment of Dual and Triple Energy Window Scatter Correction in Myocardial Perfusion SPECT with a 4D Phantom

El Bab, M.; Guvenis, A.

2026-04-25 cardiovascular medicine 10.64898/2026.04.17.26351095 medRxiv

Top 2%

0.3%

Conflicting evidence on scatter correction (SC) methods plagues quantitative myocardial perfusion SPECT (MPI), hindering standardized clinical protocols. This simulation study, utilizing the SIMIND Monte Carlo program and a highly realistic 4D XCAT phantom, systematically evaluates Dual Energy Window (DEW, with k=0.5) and Triple Energy Window (TEW) SC techniques. We uniquely investigate their performance across various photopeak window widths (2, 4, and 6 keV) and novel overlapped and non-overlapped configurations for Tc-99m MPI, parameters that remain largely unexplored in realistic cardiac models. Images were reconstructed with OSEM under uncorrected (UC), SC, and combined attenuation- and scatter-corrected (ACSC) conditions. Quantitative analysis focused on signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), defect contrast, and relative noise to background (RNB). Our findings consistently show ACSC's superior performance in CNR, SNR, and defect contrast, confirming its critical role. Interestingly, SC alone reduced noise but compromised defect contrast relative to UC, highlighting a potential trade-off without attenuation correction. Crucially, this study reveals minimal influence of photopeak window width and overlap configuration on image quality, and no significant difference between DEW and TEW across most metrics. These results provide essential evidence for optimizing quantitative MPI protocols, suggesting that for Tc-99m, the choice between DEW and TEW, and specific window settings, may be less critical than ensuring robust attenuation correction.
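
For readers unfamiliar with the two methods, the window-based subtraction they perform can be sketched as below. This is a minimal illustration of the standard DEW and TEW formulas, not the authors' SIMIND pipeline; the function names, the window widths passed as arguments, and the clipping of negative counts to zero are assumptions made here.

```python
import numpy as np

def dew_correct(photopeak, scatter, k=0.5):
    """DEW: subtract k times the scatter-window counts from the photopeak counts.

    k=0.5 matches the value quoted in the abstract; clipping is an assumption.
    """
    primary = np.asarray(photopeak, dtype=float) - k * np.asarray(scatter, dtype=float)
    return np.clip(primary, 0.0, None)

def tew_correct(photopeak, lower, upper, w_peak, w_lower, w_upper):
    """TEW: trapezoidal scatter estimate from two narrow flanking windows.

    Window widths are in keV. Counts per keV in the lower and upper windows
    are averaged and scaled by the photopeak window width.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scatter_estimate = (lower / w_lower + upper / w_upper) * w_peak / 2.0
    primary = np.asarray(photopeak, dtype=float) - scatter_estimate
    return np.clip(primary, 0.0, None)
```

Both functions work per pixel on projection arrays of equal shape. In the DEW case the whole correction hinges on the single factor k, whereas TEW replaces it with a geometric estimate from the window widths.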

17

Individualized Forecasting of Headache Attack Risk Using a Continuously Updating Model

Houle, T. T.; Lebowitz, A.; Chtay, I.; Patel, T.; McGeary, D. D.; Turner, D. P.

2026-04-22 neurology 10.64898/2026.04.20.26350119 medRxiv

Top 2%

0.2%

Importance: Migraine attacks often occur unpredictably, limiting the ability of individuals to initiate timely preventive or preemptive treatment. Short-term probabilistic forecasting of migraine risk could enable more targeted management strategies. Objective: To externally validate the previously developed Headache Prediction Model (HAPRED-I), evaluate an updated continuously learning model (HAPRED-II), and assess the feasibility and short-term safety of delivering individualized probabilistic migraine forecasts directly to patients. Design, Setting, and Participants: Prospective 8-week cohort study conducted remotely at two academic medical centers in the United States (Massachusetts General Hospital and Wake Forest Health Sciences) between 2015 and 2019. Adults with recurrent migraine or tension-type headache completed twice-daily electronic diaries. A total of 230 participants contributed 23,335 diary entries across 11,862 participant-days of observation. Main Outcomes and Measures: Occurrence of a headache attack within 24 hours following each evening diary entry. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUC]) and calibration. Results: External validation of HAPRED-I demonstrated modest discrimination (AUC, 0.59; 95% CI, 0.57-0.61) and poor calibration, with predicted probabilities consistently exceeding observed headache risk. In contrast, the continuously updating HAPRED-II model demonstrated progressive improvement in predictive performance as participant-specific data accumulated. Discrimination increased from an AUC of 0.59 (95% CI, 0.57-0.61) during the first 14 days to 0.66 (95% CI, 0.63-0.70) after the first month, accompanied by improved calibration across predicted risk levels. Over the study period, 6999 individualized forecasts were delivered directly to participants. No evidence suggested that receipt of forecasts was associated with increasing headache frequency or worsening predicted headache risk trajectories. Conclusions and Relevance: A static migraine forecasting model demonstrated limited transportability to new individuals. In contrast, models that continuously update within individuals may improve predictive accuracy over time and enable real-time delivery of personalized migraine risk forecasts. Further work incorporating richer physiologic and contextual predictors will likely be necessary before such systems can reliably guide clinical treatment decisions.
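
The discrimination figures above can be read as the probability that a randomly chosen evening followed by a headache received a higher forecast than a randomly chosen evening that was not. A minimal sketch of that pairwise definition of AUC follows; the function name and any data passed to it are hypothetical and are not taken from the study.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted probabilities; labels: 1 = headache occurred, 0 = no headache.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which puts the reported values of 0.59 and 0.66 in context. The quadratic loop is chosen for clarity; rank-based formulas give the same value more efficiently on large diaries.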

18

DIVAID: Consistent division of atrial geometries from multimodal imaging according to the EHRA/EACVI 15-segment bi-atrial model

Goetz, C.; Eichenlaub, M.; Schmidt, K.; Wiedmann, F.; Invers Rubio, E.; Martinez Diaz, P.; Luik, A.; Althoff, T.; Schmidt, C.; Loewe, A.

2026-04-23 cardiovascular medicine 10.64898/2026.04.22.26351448 medRxiv

Top 2%

0.2%

The recently published EHRA/EACVI consensus statement on a standardized bi-atrial regionalization provides new opportunities for consistent regional analyses across patients, imaging modalities and clinical centers. To make this standardized regionalization widely accessible, we developed the open-source software DIVAID, which automatically divides bi-atrial geometries according to the proposed regions, ensuring consistency, reproducibility and operator independence. We evaluated the accuracy of the algorithm by comparing its results to manual expert annotations across 140 geometries from multiple modalities and centers. Veins were automatically clipped correctly in 81% and orifices annotated correctly in 100% of cases. The median (interquartile range; IQR) Dice similarity coefficient (DSC) for left atrial regions was 0.98 (0.96-1.00) for DIVAID-expert and 0.98 (0.94-1.00) for inter-expert comparisons. For right atrial geometries, DSC was higher for DIVAID-expert than for inter-expert comparisons at 0.90 (0.80-0.95) and 0.88 (0.74-0.94), respectively. To assess the accuracy of regional boundaries, we computed the mean average surface distance (MASD) for boundaries derived from automatic or manual annotations. The median (IQR) MASD between DIVAID and experts was 0.17 mm (0.03-0.78) and 1.93 mm (0.65-3.96) in the left and right atrium, respectively. To conclude, DIVAID robustly divides anatomically diverse bi-atrial geometries according to the 15-segment model, while outperforming cardiac experts in both speed and consistency, and demonstrating an accuracy of regional boundaries comparable to the spatial resolution of cardiac imaging modalities. By providing automated, consistent atrial regionalization, DIVAID enables large-scale, standardized regional analyses and data-driven investigation of harmonized, multi-dimensional datasets, which may advance atrial arrhythmia research and personalized treatment strategies.
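
As a reminder of what the reported overlap metric measures, the Dice similarity coefficient for one region is twice the size of the overlap divided by the summed sizes of the two annotations. The sketch below is a simple count-based version over per-element boolean masks (for example, mesh elements assigned to a region by each annotator); the function name, the unweighted counting, and the convention of returning 1.0 for two empty masks are assumptions, and it is not DIVAID's implementation.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two boolean region masks.

    Each mask marks which elements one annotator assigned to the region.
    Returns 1.0 when both masks are empty (an assumed convention).
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total
```

A value of 1.0 means the two annotations cover exactly the same elements, and 0.0 means they share none. On a surface mesh one would likely weight elements by area rather than count them, which this sketch does not do.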

19

Ethnic Disparities in Acute Stroke Presentation and Reperfusion Therapy in a Dutch Comprehensive Stroke Center

Lee, Y. X.; Hurkmans, P. V.; Arwert, H. J.; Vliet Vlieland, T. P.; van den Wijngaard, I. R.; Hofs, D.; Jellema, K.

2026-04-26 neurology 10.64898/2026.04.23.26351631 medRxiv

Top 2%

0.2%

Objective: To assess ethnic disparities in time to hospital presentation, use of acute reperfusion therapies, and in-hospital treatment times among patients presenting with stroke in a Dutch emergency department. Methods: In this single-centre observational cohort study, we included patients with a first-ever ischemic stroke between September 2020 and September 2021. Patients were categorized by ethnicity (with or without migration background). Demographic and stroke characteristics were compared between groups. Outcomes included: rates of presentation outside therapeutic time window, acute reperfusion therapy (intravenous thrombolysis (IVT) and endovascular thrombectomy (EVT)), and, when applicable, door-to-treatment time (DTTT), with a door-to-needle time (DTNT) and door-to-groin time (DTGT) for IVT and EVT respectively. Univariable and multivariable linear and logistic regression analyses were performed, adjusted for age, sex, and NIHSS at presentation, where appropriate. Results: A total of 232 patients were included, of whom 62 (26.7%) had a migration background. These patients were younger (66.6 vs 71.2 years) and more frequently had diabetes (27.4% vs 15.9%). Sex distribution was similar (59.7% vs 60.6% male). Stroke etiology differed between groups with less cardio-embolism (4.8% vs 15.3%) and more small vessel disease (69.4% vs 48.2%) among patients with a migration background. These latter patients presented more often outside the therapeutic time window (53.2% vs 37.1%; OR 1.90; 95% CI 1.05-3.45). EVT was less frequently performed in patients with a migration background compared to those without (8.1% vs 22.4%; OR 0.28; 95% CI 0.10-0.75). There were no significant differences in treatment times (DTTT 38min vs 30min, DTNT 35min vs 26min, DTGT 64min vs 54min). Conclusion: Patients with a migration background were more likely to present outside the therapeutic time window and had a lower rate of EVT. 
To improve access for these patients, more insight into prehospital and in-hospital barriers to and facilitators of appropriate management is needed.

20

Running Style and Stability During Uphill Running Are Largely Preserved with Increasing Shoe Sole Thickness

Kettner, C.; Stetter, B. J.; Stein, T.

2026-04-21 bioengineering 10.64898/2026.04.16.719110 medRxiv

Top 2%

0.2%

Advanced footwear technology (AFT) shoes incorporate increased sole thickness and compliant midsole materials that may alter running biomechanics. While these effects have been widely studied during level running, little is known about how sole thickness influences running style and stability during uphill running. This study examined the effects of two AFT shoes differing in sole thickness (35 mm, AFT35; 50 mm, AFT50) and a traditional control shoe (27 mm, CON27) on running style and stability during uphill running. Seventeen experienced male runners performed treadmill running at a 10% incline at 6.5 and 10 km/h in three shoe conditions. Running style was assessed using duty factor, normalized step frequency, center-of-mass oscillation, vertical and leg stiffness, and lower-limb joint kinematics. Running stability was evaluated using local dynamic stability via the maximum Lyapunov exponent and detrended fluctuation analysis of stride time. Duty factor and normalized step frequency did not differ between shoes. However, AFT shoes showed greater center-of-mass oscillation (p = 0.004) and lower vertical stiffness (p = 0.022) compared with CON27. Joint kinematics revealed significant shoe effects at the ankle (p = 0.001), particularly increased dorsiflexion and eversion in AFT conditions. Running stability showed only minor changes. Local dynamic stability differed at the trunk (p = 0.027), with reduced stability in AFT50 compared with CON27 (p = 0.006), while global stability remained unchanged. No shoe × speed interactions were observed for any variable. Overall, uphill running style and stability remained largely preserved across shoe conditions, suggesting that sole thickness alone had limited influence.